Training Neural Networks for Protein Secondary Structure Prediction: The Effects of Imbalanced Data Set
Abstract
Protein secondary structure prediction (PSSP) is one of the main tasks in computational biology. Over the last few decades, much effort has gone into solving this problem with various approaches, mainly artificial neural networks (ANNs). Generally, ANNs for protein secondary structure prediction are trained on the CB513 data set. Like protein structure databases in general, this data set is imbalanced, which can lead to a low error rate for the majority class and an undesirably high error rate for the minority class. In this paper we evaluate the effects of an imbalanced data set on the training and learning of neural networks applied to protein secondary structure prediction. To do so, we applied resampling methods to tackle the class imbalance problem. The results show that imbalanced data sets decrease the prediction rate for helices. However, the class distribution of the protein data set does not significantly affect the global accuracy (Q3).
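The contrast the abstract draws between global accuracy (Q3) and per-class prediction rates can be illustrated with a small sketch. It is not the authors' code: the abstract does not say which resampling methods were used, so random oversampling is an assumption here, as are the three-state labels H (helix), E (strand) and C (coil) and the function names `q3_and_per_class` and `random_oversample`.

```python
import random
from collections import Counter


def q3_and_per_class(true_labels, pred_labels, classes=("H", "E", "C")):
    """Return the overall accuracy (Q3) and the per-class accuracy (recall)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    total = Counter(true_labels)   # residues per true class
    correct = Counter()            # correctly predicted residues per class
    for t, p in zip(true_labels, pred_labels):
        if t == p:
            correct[t] += 1
    q3 = sum(correct.values()) / len(true_labels)
    per_class = {c: correct[c] / total[c] for c in classes if total[c]}
    return q3, per_class


def random_oversample(samples, labels, seed=0):
    """Assumed resampling scheme: duplicate randomly chosen samples of the
    smaller classes until every class is as large as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for s, y in zip(samples, labels):
        by_class.setdefault(y, []).append(s)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for y, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([y] * target)
    return out_samples, out_labels


# Toy example: 6 coil, 2 helix, 2 strand residues.
true = list("CCCCCCHHEE")
# A degenerate predictor that always outputs the majority class.
pred = list("C" * 10)

q3, per_class = q3_and_per_class(true, pred)
balanced_samples, balanced_labels = random_oversample(list(range(10)), true)
```

In this toy case the majority-class predictor still reaches a Q3 of 0.6 while its helix and strand rates are zero, which is why a global score alone can hide the effect of imbalance. After oversampling, each class contributes the same number of training samples.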
Similar resources
Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches
A DNA sequence, which contains all genetic traits, is not a functional entity. Instead, it is converted into protein sequences by the processes of transcription and translation. The protein sequence then takes on a 3D structure, which is a functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can think of is undertaken by proteins with specific functio...
The Prediction of Surface Tension of Ternary Mixtures at Different Temperatures Using Artificial Neural Networks
In this work, an artificial neural network (ANN) has been employed to propose a practical model for predicting the surface tension of multi-component mixtures. In order to develop a reliable model based on the ANN, a comprehensive experimental data set including 15 ternary liquid mixtures at different temperatures was employed. These systems consist of 777 data points generally containing hydrocar...
Nyquist Plots Prediction Using Neural Networks in Corrosion Inhibition of Steel by Schiff Base
The corrosion inhibition effect of N,N′-bis(n-Hydroxybenzaldehyde)-1,3-Propandiimine on mild steel has been investigated in 1 M HCl using electrochemical impedance spectroscopy. A predictive model was presented for Nyquist plots using an artificial neural network. The proposed model predicted the imaginary impedance based on the real part of the impedance as a function of time. The model to...
PREDICTION OF COMPRESSIVE STRENGTH AND DURABILITY OF HIGH PERFORMANCE CONCRETE BY ARTIFICIAL NEURAL NETWORKS
Neural networks have recently been widely used to model some of the human activities in many areas of civil engineering applications. In the present paper, artificial neural networks (ANN) for predicting compressive strength of cubes and durability of concrete containing metakaolin with fly ash and silica fume with fly ash are developed at the age of 3, 7, 28, 56 and 90 days. For building these...
Multi-Step-Ahead Prediction of Stock Price Using a New Architecture of Neural Networks
Modelling and forecasting the stock market is a challenging task for economists and engineers, since it has a dynamic structure and nonlinear characteristics. This nonlinearity affects the efficiency of the price characteristics. Using an Artificial Neural Network (ANN) is a proper way to model this nonlinearity and it has been used successfully in one-step-ahead and multi-step-ahead prediction of di...